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8.0 INTRODUCTION 


In Unit 7, Recurrence Relations* we gave an introduction to the things we 
have to consider when we try to find numerical answers to general prob- 
lems, and illustrated these considerations, and their analysis, with respect 
to computations involving linear recurrence relations. 


In this unit we look from the same point of view at an even more im- 
portant and ubiquitous problem, the solution of simultaneous linear 
algebraic equations. As before, we are concerned with two main aspects 
of stability: 


(i) Can the problem exhibit inherent instability (ill-conditioning), and how 
would we recognize this defect? 

(ii) Can some methods exhibit induced instability, and can we find methods 
which avoid it? 


These topics have already had some previous introductory attention. For 
example, in Unit M100 28, Linear Algebra IV we looked at the possibilities 
of ill-conditioning, observing that for the case of two equations the ill- 
conditioning is related to the difficulty of finding accurately the intersection 
of two nearly parallel lines in a plane. We also noticed that the ill-condi- 
tioning in general is related to the near-singularity of the matrix of the 
equations, and indeed recommended the computation of the inverse with 
a view to revealing the possibility of inherent instability. 


We did not investigate the stability of methods in Unit M100 28, but we 
did give some measure of the relative efficiency of a few methods in relation 
to the amount of arithmetic involved. Suppose we have a linear problem 
in the form 


Ax =b, 


where A is a real square matrix of order n, b a known one-column matrix 
representing a vector in R", and x the one-column matrix representing the 
required solution vector with elements x,, x2, ..., X,- There is a formal 
“ mathematical” solution known as Cramer’s rule, given by 


„=b, r=1,2,...,n, 


where A is the determinant of the matrix A, and A, is the determinant of 
the matrix which is obtained by replacing the rth column of A by the 
column b. This method we rightly cast out because the amount of work 
involved is prohibitively large. 


On the other hand, a process of elimination, invented by Gauss and 
closely related to the calculation of the Hermite normal form (see Unit 3, 
Hermite Normal Form), turned out to be an efficient method, at least with 
respect to the volume of computation involved. 


In this unit we take up these questions in more detail. Following the pattern 
of Unit 7, Recurrence Relations, we first give, in Section 8.1,some examples 
of ill-conditioning for a physical problem, in which at least some of the 
data are uncertain (due perhaps to inaccurate measurement or ignorance), 
and for a mathematical problem in which the data, though exact, cannot 
be stored exactly in our computing machine. Since ill-conditioning (inherent 
instability) is a function only of the problem, all the examples at this stage 
are treated with exact arithmetic, so that the ill-conditioning is separated 
from the (induced instability) effects of rounding errors in the computation. 


In sub-section 8.1.3, we emphasize the fact that ill-conditioning is a pro- 
perty of the problem, not the method, by illustrating the curious fact that 


* When referring to Unit 7, we quote the short version of its title only. 
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while the solutions x,, x2, ..-, X, of the linear equations may be uncertain 
by relatively large amounts in respect of small changes (uncertainties) in 
the data, it might well happen that some not very different problem may 
be relatively well-conditioned. Such a problem, for example, which arises 
quite frequently in practice, is the evaluation of the single number 


YHCyxXy + CX + ore oO hes 


(Xis X2, +++» Xn) being the solution vector of Ax = b, for specified numbers 
C1, Ca +++) Cy For some cs, the number y can be quite well determined, 
because the possibly large uncertainties in the xs are not independent. 


In Section 8.2 we examine the most popular and useful direct method, the 
Gauss elimination method, and its variant attributed to Jordan, and we 
note their relation with our work in Unit 3 on the production of the Hermite 
normal form. With exact arithmetic this gives the unique exact solution if 
A is non-singular, or reveals any singularity if it exists. We also show (in 
sub-section 8.2.4) that the method can be expressed in terms of an im- 
portant and fundamental matrix theorem concerned with the expression 
of a matrix as the product of two simpler (triangular) matrices. 


Having established the method, we then examine its performance, investi- 
gating whether or not it can exhibit induced instability when we have to 
use inexact computer arithmetic. Sub-section 8.3.1 discusses an example in 
which induced instability occurs, and the reason for this is analysed in 
general terms in sub-section 8.3.2. The analysis is constructive, in that it 
shows the possibility of making small changes in the basic method which 
virtually eliminate the induced instability. This stable variant is described 
in sub-section 8.3.3. 


The error analysis, called backward error analysis, which we use in the 
analysis described above, unfortunately does not give any bound for the 
error of our computed solution, so that we have to discuss this separately. 
In Section 8.4, we show how to correct an approximate solution with a 
small amount of extra work, and this itself answers the question about the 
accuracy of the first approximation. It also partially answers the question, 
so far unexplored, of how we most easily detect any ill-conditioning in the 
original problem. 


The methods discussed in this unit are called direct methods, since if every- 
thing is “exact” the solutions are obtained exactly in a finite number of 
operations. 


In another class of methods, called iterative methods, we never get the exact 
solutions, but approach them more and more closely in a sequence of 
operations, which is terminated when our approximation is sufficiently 
accurate in some particular context. Iterative methods were introduced 
very briefly in Unit M100 28. We do not include them here because, 
though they are very important, their main application is in the treat- 
ment of matrices of rather special types, for example, those arising in the 
numerical solution of partial differential equations. 


8.1 ILL-CONDITIONING OF THE LINEAR 
EQUATION PROBLEM 


8.1.1 A Physical Problem 


As usual in our numerical work, we look for possibilities of ill-conditioning 
in any given problem. We begin by illustrating the possibility with a simple 
example, Suppose we have the physical problem 


$x, +4. + 4x3 =b, +e, 
ox, + $2 + 3x3 = ba + ep 
4x, + dx, + ixa = ba + e3 


in which the coefficients are known exactly, whereas the numbers b, are i 


subject to measurement errors of amounts e,, and suppose |e,| < 107? 
for r = 1, 2, 3. 
In terms of matrices, we may write our physical problem as 

Ax=b+e 


and whatever the values of the b,, the contribution to the solution from 
the uncertainties e, is A“ te. 


A simple computation (see Unit M100 26, Linear Algebra III) gives 


72 —240 180 
A-+=|-240 900 -—720 
180 —720 600 


By inspecting the signs of the elements of A~1, we see that the maximum 
effect of the e, occurs when e, = +107, e, = F107? and e3 = + 1073, 
and that the solution x has possible uncertainties of amounts +0.492, 
¥ 1.860, +1.500. 


These uncertainties are very much larger than those of the data, and we 
have a severe case of absolute inherent instability (ill-conditioning). Does 
this matter? What is the relative effect, which may not be serious if the 
“true” solution x = 4~"b is also large? . 


Well, if b, = 1, b; = — 1, b, = 1, then 
x, = 492, x, = —1860, x, = 1500. 


The relative uncertainty in the answer is precisely the same as that of the 
data. So with b, = 1, b} =—1, b, = 1, this problem is not ill-conditioned 
in a relative sense. 


Unfortunately there are some right-hand sides, of comparable magnitude 
with those of the previous example, for which the true solution x is quite 
small. For example, with b, = 0.95, b, = 0.67, b, = 0.52, we find 


x, = 1.2, x, = 0.6, x3 = 0.6. 


With the same uncertainty |e,| < 107° in the right-hand sides, all we can 
say about our solution is that 


x, = 1.20 + 0.492, x, = 0.60 F 1.860, 
x3 = 0.60 + 1.500. 


Here we have a very ill-conditioned situation. Not a single significant 
figure in the answer is “meaningful”, that is, can be quoted as being 
certainly correct for all uncertainties in the data within the known limits! 
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Exercises ‘ 


1. Consider the matrix 


gee! 


and suppose that we want to solve the equations 
Ax=b+e, 

where 
le,| <0.5 x 1077. 


Show that the problem is absolutely ill-conditioned. 


2. Find a vector b for which the problem of Exercise 1 is relatively well- 
conditioned, and another vector b, with elements of much the same 
“size”, for which the problem is relatively ill-conditioned. 


Solutions 
1. We find 


-;_[ 180 —210 
a = [20 252) 


i and the maximum uncertainty in the solution, which is A~ +e, 
occurs when e, = +0.5 x 1077, e, = F0.5 x 107, that is, 
the uncertainties in the data e, and e, have their maximum 
values with different signs. We find A~‘e, with these values 
for e, and ez, has elements +1.95 and $2.31. These are the 
uncertainties in the respective solutions x, and x,. They are 
very much larger than e, and e,, and the problem is, there- 
fore, absolutely ill-conditioned. 


2. An example is given by b, = 1, b, = —1, for then the values 
of x, and x, are 390 and — 462, and the maximum values of 
the uncertainties in x, and x, are only 0.5% of the values of 
x, and xz. This is the same ratio as that of the e, to the b,, 
and the problem is relatively well-conditioned. 


Relative ill-conditioning occurs when A~ te has relatively large 
importance. We then want b, and b, to be such that both 
180 b, — 210 b, and —210 b, + 252 b, are small for values of 
b, and b, of the order of unity. Inspection shows that b, = 1, 
b, = $ is one such pair, which gives 


x, =04 1.95, x, = 6.0 F 2.31, 
with severe relative ill-conditioning. 


A little further experiment gives, for b, = 1, b, = $p for ex- 
ample, the results 


x, =2.5 + 1.95, x, = 3.0 F 2.31, 


and the ill-conditioning is here even more pronounced. 


8.1.2 A Mathematical Problem 


Consider now a similar example of a mathematical problem, in which all 
the data are exact but not capable of exact storage. We know from pre- 
vious work that if the physical problem is ill-conditioned, then it is prob- 
ably difficult to get an accurate solution (which here is meaningful) to the 
mathematical problem. Suppose that in the problem posed in the first 
paragraph of sub-section 8.1.1 the right-hand sides are exactly 0.95, 0.67 
and 0.52, so that the exact solution is x, = 1.2, x, = 0.6, x; = 0.6, but 
that we have only a three-digit machine so that the left-hand coefficients 
cannot be stored exactly. We then effectively solve the slightly “ perturbed ” 
problem 


0.500x, + 0.333x2 + 0.250x, = 0.95 
0.333x, + 0.250x2 + 0.200x, = 0.67 
0.250x, + 0.200x, + 0.167x3 = 0.52 


The solution of this problem, avoiding any further arithmetic errors (which 
our poor old machine would almost certainly make) is 


xX, = 1.114, x, = 0.937, x, = 0.324, 
correctly rounded to three decimal places. It is far from the “truth”, and 


we should clearly need to store the data far more accurately to produce a 
tolerably accurate solution. 


These results confirm the expectations stimulated by our previous ex- 
perience in Unit 7, Recurrence Relations, 


We repeat that what we have illustrated is inherent instability or ill-con- 
ditioning; it is a function only of the problem. The only errors were 
“physical” uncertainty in the data or “mathematical” errors in the 
storage of exact data. The method of solution was quite immaterial, and 
in fact we used exact methods on inexact data. Induced instability was 
nowhere in evidence. 


Although we have indicated that ill-conditioning may exist we have not 
indicated how we might detect or analyse this situation. This we defer 
until sub-section 8.4.2, after we have found a stable method of solution. 


8.1.3 Effect of Correlated Uncertainties in the Solution 


Another important point, not made explicitly in Unit 7, is that it is non- 
sense to talk of a recurrence relation as being ill-conditioned. It all depends 
on what we are trying to do with it. In Unit 7, for example, we observed 
that the problem defined by the recurrence relation 


Mer =l—ry,, 
and associated condition yọ = 1 — e~', is certainly ill-conditioned for the 
determination of y;, y2, ..., but the same recurrence relation with con- 


dition y, = 0 is perfectly well-conditioned for the determination Of Ya-ts 
Yn-25 +++ Vis Yo- 
In a similar way we cannot talk of an ill-conditioned matrix, except in 
particular contexts, such as the solution of a particular set of linear equa- 
tions, the inversion of the matrix, or the determination of its eigenvalues 
and eigenvectors. 


It can even happen that the solution of a particular set of linear equations 


is ill-conditioned, whereas some linear combination of the components of 


the solution is reasonably well-conditioned, and this may be the number 
we are trying to find. The point here is that the errors in the individual 
components of the solution are not independent, and some linear com- 
bination may cause some cancellation of the errors. 
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Consider, for example, the problem in sub-section 8.1.1 of finding the 
solution of the equations ‘ 


4x, + 4x, + 4x3 = 0.95 + €, 
4x, + 4x2 + 4x3 = 0.67 + ez 
4x, + 4x, + $x3 = 0.52 + e3 


where |e,| < 1073, and where we obtained the extremely ill-conditioned 
results 


x, = 1.20 + 0.492, x, = 0.60 F 1.860, 
xa = 0.60 + 1.500. 


Suppose, however, that our problem is not the computation of the x, but 
the determination of a quantity 


Y = CX, + Cp Xz + C3X3,5 


for some given numbers c,. There are, in fact, some numbers c, for which 
this problem is not badly conditioned. Let us try to find some. 


The vector x of the solution is 


72 —240 180] [0.95 + e, 
x=A"!(b+e)= |—240 900 —720| [0.67 + ez 
180 —720 600] [0.52 +e, 


i 1.20 72 —240 180] fe, 
=|0.60]+]-240 900 —720| |e, 
0.60. 180 -720 600] le, 


The required value of y is then 


1.20 72 —240 180] fe; 
y=[c, cz ¢3)( | 0.60] + | —240 900 -—720] |e, 
0.60 180 —720 600} Les 


= 1.20¢, + 0.60c, + 0.60c3 + de, 
where the row-vector d = [d, d} d3] has components 


d, = 72c, — 240c, + 180c3 

d, = —240c, + 900c, — 720c, 

d, = 180c, — 720c, + 600c; . 
All we require is that d}, d}, and d, should be small for “reasonable” 
values of the ¢,, even though A~* has large elements. By inspection we 
see that c; = 1, c3 = 1, c} = 1 will give d, which are quite small relative to 


the elements of A~?. In fact, dı = +12, d, = —60, d, = +60, and the 
maximum possible value of |de], with |e,] < 107%, is 132 x 1073 = 0,132. 


We have therefore produced the result 


y = 2.400 + 0.132, 


and the statement y = 2.4 can be made with the confident assertion that 
its maximum possible relative uncertainty is less than 6%, a quite satis- 
factory result in many physical problems. 


Exercise 
Consider the equations 
tx, thx, =] +e, 
$x, + 3x2 =H + eg 
where |e,| <0.5 x 1077, Je,| <0.5 x 1077. 


We saw in Exercise 2 of sub-section 8.1.1 that all we can say about the 
solution is 


x, = 2.5 + 1.95, x. = 3.0 F 2.31, 
a very ill-conditioned situation with not a single meaningful figure. 


Suppose, however, that our problem requires the value not of x, and x, 
but of y = x, + 0.8x,. Show that 


y=4.9 + 0.102, 


a reasonably well-conditioned situation. 


Solution 


From the computation of the inverse in Exercise 1 of sub-section 
8.1.1, we have ` 


ye 180 —210][ 1+e, 
~ -210  252||H +e: 
_ [2.5 g 180e, — 210e, 
~ [3.0 —210e, + 252e, 


E 2.5 180e; — 210e; 
var 0.83/55] a 0.83] e Pa] 


=4.9 + 12e, — 8.4e3. 


The worst contribution from the uncertainties is 


+ [12(0.5 x 1072) — 8.4(—0.5 x 1072)] = +0.102. 


8.1.4 Summary of Section 8.1 


In this section and the introduction we defined the terms 


direct method (page C 6) rex 
iterative method (page C 6) ae 
physical problem (page C 7) ae 
mathematical problem (page C 9) ** 

Techniques 

1. Solve Ax = b, given specified errors in A or b and hence say whether * 
or not the problem is ill-conditioned. 

2. Ifsuch a problem is ill-conditioned, determine whether the problem of * 


finding } $=; Cs Xs, given Cy, C2, .-+, Cys is well-conditioned. 
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8.2 DIRECT METHODS 
8.2.1 Gauss Elimination 


Turning now to a direct method for solving simultaneous equations, we 
shall first describe the elimination method of Gauss in its basic form. For 
simplicity, we deal in detail with three or four equations in three or four 
unknowns, considering Ax = b but recording only the elements of A and 
b in an array like: 


A b 
a A2 443 bi 
a21 22 23 b2 
a31 Q32 433 bz 


The method has two essential parts. 
(i) Elimination 


Evaluate m3, = —4@2,/a,,, and add mz, times the first row of the array to 
the second row to produce a new second row with first element zero. 
This is possible if a,, # 0. 


Evaluate #3; = —a@3,/a,,, and perform the corresponding operation with 
the third row. Again, this is possible if a,, # 0. 


The resulting array looks like: 


A) p® 
Q Q2 A3 b, 
(2 2 2 
0 a? af LD 
2 2 2 
0 a aR bp 


where af) = a2) + mz; 4,2, etc. 


Now ignore the first row, form m3, = —a%}/a¥), and add ma. times the 
second row to the third to eliminate the element a2). This is possible if 


32° 
aQ) #0. The resulting and final array looks like: 


AO =U bd =c 
a a2 i3 bı 
o aR aR aD 
0 0 a bY 


Here U is an upper triangular matrix,* and we have “reduced” the equa- 
tions Ax = b to the equivalent form Ux =e. 


This is the “elimination”? part of the algorithm, and the general method 
for n equations should be clear from this description. The required solu- 


tion is obtained in the second part of the algorithm, the so-called “ back- 
substitution ”. 


(ii) Back-substitution 
Compute x, = b'/aQ), which is possible if a3) 4 0. Substitute for x3 in 
the second row of Ux = ¢ and compute 


x2 = (6D — aQx,)/aQ?, 


which is possible if af} #0. Substitute finally in the first equation to 


produce x,, which is possible if a}, # 0. This completes the back- 


) s x substi- 
tution and we have obtained the required answers. 


* U is also referred to as an upper triangle. 
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Example 
As an example, let us solve the equations 
Xi — 3x2 + 2x; = — 12 
X,+2x,+ x3= 5 
=X; —-3x,-3x,;= —4, 


The whole of the arithmetic appears as follows. 


A b 

1 -3 2 -12 

m,=-1 1 2 1 5 
M3,= 1 =- -3 -3 -4 

AQ) p?) 

1 -3 2 —12 

0 5 -1 17 

M32 = $ 0 -6 -1 —16 
AV =U bO=¢ 

1 -3 2 -12 

0 5 -1 17 

0 0 -t 22, 


This completes the elimination. 

For the back-substitution, from the last row of Ux =e we obtain 
— hs = 42, 

which gives x, = ~2. 

From the second row, 5x, — x3 = 17, i.e. 5x2 = 15, which gives 
X2 = 3, 

and from the first row, 
xX, — 3x, + 2x3 = — 12, 

which gives 
x,=1. 


Notice a connection between this method and the calculation of the 
Hermite normal form discussed in Unit 3. In the latter process, after the 
production of A) we should not only eliminate the a) term using the 
multiplier 1732, but we should also eliminate the a‘? term (which is so 
far just the original a,), by adding a multiple m2 of row 2 of A®) to the 


first row of A®. With obvious further operations of this kind the arrays 
have the following appearance. 


« 


A b 
1 -3 2 —12 
m, = —1 1 2 l 5 
m= 1 -1 -3 -3 -4 
AQ b» 
m= 4 1 -3 2 -12 
0 5 =l 17 
M3, = $ 0 -6 -l —16 
io) b9 
m3 = ay 1 0 ¢ =f 
m3=-7 0 5 ae 17 
o 0 = 22 
Aw pb 
1 0 0 1 
0 5 0 15 
0 o -4 32 


Finally, dividing by the diagonal elements we have the Hermite normal 


form 
: 


AG) po 
1 0 0 1 
0 1 0 3 
0 0 1 -2 


and the solution of the linear equations appears in the last column. The 
reduction of A to Hermite normal form produces essentially a diagonal 
matrix (and the corresponding elimination method is called the Jordan 
method), for which no back-substitution is needed. 


In practice we prefer the Gauss method because it involves fewer numerical 
operations. For one set of linear equations with a matrix of order n, it is 
known that the Gauss method requires the computation of n reciprocals, 
$n? + n? — 4n multiplications, and 47? + 4x? — $n additions. The Jordan 
method, on the other hand, requires reciprocals, 4n? + n? — 4n multi- 
plications, and 4n? — 4n additions. In calculating the number of opera- 
tions involved in the two methods, we have assumed that all possible 
economies are made. For example, in computing the multiplier m3, , say, 
we first evaluate 1/a¥) and retain this number for final use in the back- 
substitution, or at the stage of producing b© in the Jordan method. 


In both methods, of course, there are minor variations. For example, we 
could divide all the elements of a relevant row of A“ and b by the 
diagonal element before performing the elimination. At the stage A®, 
b? of our illustrative example, we had 


AQ) p2 
1 -3 2 -12 
0 5 -l 17 
0 -6 -I —16 


which we could replace by 


AW be 
1 —3 2 -2 
o 1 ~4 3 
0 -6 -I =16 = 


and perform the elimination with the multipliers given immediately as the 
other elements in the second column, below the second row in the Gauss 


method and both above and below in the Jordan method. 


This is the technique actually used in our work on the Hermite normal 
form, but in practice it involves more storage and more arithmetic (and 
therefore more rounding errors which we shall have to consider in due 
course). Experience and simple counting of the number of numerical 
Operations reveal that our first illustrated method is much better in these 


respects. 


Exercise 


Use the Gauss method to solve the equations Ax = b, where 


4 =? 3 2 
A=|-2 4 2| andb=|-4 
3-1 2 3 


Solution 


Laying out the calculation as in the previously worked example, 


we get 
A b 
4 -2 3 2 
Mg, =} -2 4 2 —4 
m3, = -4 3 =l 2 3 
Ae) pe 
4 -2 3 2 
0 3 i -3 
m; = —4 0 ł -4 3 
AM =U bd =c 
4 -2 3 2 
0 3 4 -3 
0 0 -$ 2 
We then apply back-substitution to Ux =e: 
-5x =2, y= 
3x2 + 3x3 = —3, 3x, = ¥, x,=2 
L 


4x, — 2x3 + 3x3 = 2, 4x, = $$, x, = 1S 


Matrix Inversion 
If A is a non-singular 3 x 3 matrix, then the matrix X in 


AX=I 
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is the inverse A~' of A. We can rewrite this equation in the form 
Ax) x?) xB] = fie? i) iM] 


where x), x and x are the columns of X and i, i, i© are the 
columns of Z. Thus, we can find 47! by solving the three sets of three 
simultaneous equations: 


AX =i — (r= 1, 2,3) 


and we can apply the Gauss method above. We do not, however, solve the 
three sets of equations singly, but we reduce A to upper triangular form 
as before, performing the same operations on three right-hand sides, which 
are the columns of the unit matrix, and then carrying out three separate 
back-substitutions. The full computation for finding the inverse of 


1 ~3 2 
A= 1 2 1| is as follows: 
-1 -3 -3 


Elimination 


1 -3 2 1 0 0 
my, = 1 k 2 i o 1 0 
m= 1 =f. =3 =3 0 0 1 
1 -3 2 1 0 0 
0 5 -l -1 l 0 
© Mp=$ 0 -6 -! 0 1 
1 -3 2 1 0 0 
0 5 -l -1 1 0 
0 o -4  -F $ 1 
aq) B (3) 
Back-substitution 
Taking columns (1), (2) and (3) in turn yields: 
O XP aah P= a xP 
D Pea a Pat 
O 29 = -r P = a Pay 


The inverse has columns x“), x and x®) and is therefore 


3°15 7 
At=j]-2 1 -1 


1 -6 -5 
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8.2.2 The Jordan Method 


The Jordan method for inversion, essentially the same as the production 
of the Hermite normal form, avoids back-substitution. When the matrix 
on the left has been transformed to the unit matrix, then that on the right 
is the inverse of the original matrix. The full computation for finding the 
inverse of 


is as follows: 


1 -3 2 1 0 0 
mı = —1 1 2 1 0 1 0 
M3, =1 -1 -3 -3 0 0 i 
Mm, = 2 1 -3 2 1 0 0 

0 5 =l -l 1 0 
M32 = £ 0 ~6 -1 0 1 
m3 = ry 0 0 3 $ 3 0 
ma = -Ñ 0 5 -I -1 1 0 

0 o= =F $o 

1 0 0 wT H pad 

0 5 0 -# f+ - 

0 0 -44 -+ $ 1 

1 0 0 sx usa 

o 1 =a otro 

0 0 a -6 -$ 


This is the method already introduced in Unit 3. 


An interesting fact is that for inversion our descriptions of the Gauss and 
Jordan (Hermite normal form) methods involve identical amounts of 
arithmetic: n reciprocals, n*?—1 multiplications and n? —2n? +n 
additions. 
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8.2.3 Essential Row Interchanges 


In the Gauss elimination process we reduced A to U by adding multiples 
of the first row to all the others to reduce to zero the column | elements 
in rows 2, 3, ..., 7. In this process the element in the first row and column 
is called the pivot, and the process is possible only if the pivot is non-zero. 
After the first reduction to A’, we continue with the new aS? as pivot, 
to eliminate the new column 2 elements in rows 3, 4, ..., 7, and the original 
pivotal rowis unchanged. In other words, we take pivots down the diagonal, 
reducing A to U, and this is possible if none of the pivots is zero. 


The process fails at the first step if a,, is zero, and we must then use some 
other row as pivotal row with its corresponding first element as pivot. 
It is convenient to interchange the first row of A with the pivotal row, 
and reduce this ‘‘row-permuted” A to the corresponding form A®). If 
the new a$} is zero, we look for a non-zero column 2 element in rows 3 
to n, and interchange the relevant rows. A simple example will illustrate 
this, concentrating on the reduction of 4 to an upper triangular matrix U. 


A 


KERN 
Hore 


0 1 
1 1 
2 2 
1 2 


Since a,, = 0, we must use some other row as first pivotal row. Any one 
will do, so we interchange the first two, to produce 


AW 


-non 
BRED 
Kore 


and perform the elimination (only two rows needing attention) to obtain 


AD 
1 1 1 
0 1 1 1 
0 o0 =2 
0 l =l 0 


z ae x o 
We can use a{}) as pivot; there is only the a) term to eliminate, and we 
obtain 


A®) 
1 1 2 1 
0 1 1 1 
0 0 0 -2 
0 0 -2 -i 


We cannot now use the zero a¥) as pivot, and we must interchange rows 
3 and 4. No further elimination is needed, and we produce the upper 
triangular form 


1 1 2 1 

0 1 1 1 

0 0 -2 -1 

0 0 0 -2 
Exercises 


l. In the last example we could at the start have used the third row 
as pivotal row, interchanging it with the first. Carry out the relevant 


arithmetic to produce an upper triangular form, making subsequent row 
interchanges as necessary. 

2. Using the upper triangular form produced in the text immediately 
preceding Exercise 1, show that det A = 4. Verify this by considering the 
triangular form obtained in Exercise 1. 

3. Suppose that in Exercise 1 we interchange rows | and 3 of A, then 
interchange rows 2 and 3 of the new matrix, and finally rows 3 and 4 of 
this matrix, before performing any elimination. Show that the pivots can 


now be taken on the diagonal and that the computed U is that quoted in 
Exercise 1. 


Solutions 


1. The first array becomes 


2 2 4 0 
Mz, = —4$ I 1 2 1 
™m3,;= 0 0 1 1 1 
Mg, = ~} 1 2 1 1 
and elimination produces 
2 2 4 0 
0 0 0 1 
0 1 1 1 
0 1 =l 1 


We cannot use a§?) as pivot, but a is a possibility. Inter- 
changing rows 2 and 3 gives 


2 2 4 0 
0 1 1 1 
0 0 90 1 
0 1 -l 1 


and at this stage there is just one elimination to perform. We 
find the new matrix 


2 2 4 0 
0 1 1 1 
0 0 0 i 
0 0 -2 0 


Now a{} is not a possible pivot, but a§} is possible, and we 
therefore interchange rows 3 and 4, find no further elimination 
necessary, and record the final upper triangular form 


U 
2 2 4 0 
0 1 ] 1 
0 oO -2 0 
0 0 o0 1 


(Note that after the first elimination it would have been 
possible to use a?) as pivot. This would give a different final 
upper triangular U.) 

2. U from the text has been produced from A by two row 
interchanges, each of which changes the sign of det A, together 
with addition of multiples of some rows to others, each of 
which does not change the sign. Hence 


det A = (— 1} det U 
= product of diagonal terms of U. 


Similarly, the U of Exercise | was produced from A in a 
way that involved three row interchanges, and det U = —4. 
Thus, both methods give det A = 4. 
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3. The first interchange (of rows | and 3 of A) gives 
2 2 0 

f l 

1 i 

1 


Sona 


I 
0 
l 2 


The next interchange, of rows 2 and 3 of this matrix, gives 


2 2 4 0 
0 1 1 1 
1 1 2 1 
1 2 if 1 
and the final interchange, of rows 3 and 4 of this matrix, 
gives 
2 2 4 0 
0 1 1 1 
1 2 1 1 
1 1 2 1 
The standard elimination gives successive matrices 
Aw 
2 2 4 0 
miı= 0 0 1 l l 
i m3, = —$ 1 2 1 1 
Mg, = —4 I 1 2 1 
PGA 
2 2 4 #0 
0 l 1 1 
M3, = —1 0 1 -l 1 
m= 0 0 0 0 1 
and the final form is 
AM =U 
2 2 4 0 
0 1 1 1 
0 0-2 0 
0 0 0 1 


Such interchanges as those above were used in the production of the 
Hermite normal form. We note, here as well as in the work on the Hermite 
normal form, that if at any stage there is no possible candidate for pivot 
then the matrix is singular. One might wish to do something else (such as 
to continue with the production of the Hermite normal form), but so far 
as the solution of linear equations is concerned we know that we cannot 
then find a unique solution and that there may not be a solution at all. 
This is somewhat unlikely in practical work and we shall not deal with it 
in this unit. 


The interchanges above were essential, since without them we could not 
perform the relevant elimination. When we perform an error analysis of 
the method, to detect the possibility of induced instability, and then look to 
see if we can produce a more stable method, we shall learn the interesting 
fact that selected non-essential interchanges become very important and are 
in practice necessary to produce a guaranteed stable method. 


We shall consider this in Section 8.3, but for the rest of this section we 
examine the matrix equivalent of our elimination method, discovering in 
the process that the same technique can be considered in a completely 
different way in which the idea of elimination never appears explicitly. 
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8.2.4 Matrix Equivalent of Gauss Elimination: 
Matrix Decomposition 


We know from previous work (Unit 3, for example) that all the operations 
performed in reducing A to upper triangular form U can be expressed in 
terms of matrix operations on A. Suppose, for example, that we can take 
pivots down the diagonal. Then it can easily be seen, using a 3 x 3 matrix 
for illustration as usual, that we first form 


J, A 


a 1 0 Ofan az ays 
AM =J], A= |m; 1 Of fan, anz a3 
m3, 0 I} las, a az 


g 


where mz; = —a3;/4;1, M31 = —a3,/a,,. The premultiplying matrix J, is 
lower triangular with unit elements on the diagonal. Such a matrix is unit 
lower triangular. 


At the next and final stage in our example, we have 


1 0 0 
A®=U=J, J,A, Ja=|0 1 0 
0 ma 1 


where 1713, = —a¥/a¥). 

This extends in an obvious way to the general case: the result is 
Jeila C SgJ,A =U, 

where U is upper triangular and each J, is unit lower triangular. 


The product of (unit) lower triangular matrices is (unit) lower triangular, 
and the inverse of a (unit) lower triangular matrix is also (unit) lower 
triangular (see the Exercise below), and it follows that we can regard the 
elimination process as the equivalent of the triangular decomposition of a 
matrix in the form 


A=LU, 

where L is a unit lower triangle and U an upper triangle. 

It is of interest to determine the nature of L in terms of the J,. We have 
L= (Jri Spaz Ja Jy)? = ST Sz! sie i 


For the case of order three we easily verify that 


1 0 0 
Jīt=|-ma 1 0j, 
—m3, 0 1 


which is identical with J, except for sign changes in the off-diagonal 
elements. Similarly 


1 0 0 
Jgi= 10 1, 0 
0 =m; 1 


Finally, we find that the product L = JĮ * Jz‘ has the interesting form 


1 0 0 
L= |-m,; 1 0j, 
—M3,; —M3, | 


none of the m,, being multiplied together! 


It follows that, in the decomposition 4 = LU, the U matrix is the form 
obtained from the elimination process, and the L matrix is formed from 
the negatives of the multipliers in the elimination process. 
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Exercise 
‘ 
Verify for 3 x 3 matrices the statements made above: 


(i) the product of (unit) lower triangular matrices is (unit) lower tri- 
angular; 

(ii) the inverse of a (unit) lower triangular matrix is also (unit) lower 
triangular. 


dG) mogu 0 0 1 0 oO 
a 1 Oj]? 1 OJ= a+l 1 0 
b c ljimani 1 b+cl+m c+n |1 


(ii) Employing the method of Unit 3, with the notation of this 


Solution 


unit: 

1 0 0 1 0 0 

mı = —a a ] 0 0 1 0 

mz, = —b b c 1 0 0 1 

1 0o o0 1 0 0 

0 1 0 —a 1 0 

M32 = -c 0 ec 1 -b 0 1 

1 0 0 1 0 0 

; 0 1 0 —a 1 0 
0 0 1 ac—b -c 1 


8.2.5 Compact Elimination 


The discussion in the previous sub-section gives us a method for solving 
linear equations which in explicit terms has nothing to do with elimination. 
Consider, for example, a 3 x 3 case and write 


L U A 


1 0 Ofii z t3 an M2 3 
fy 1 OF] 0 u2 uz3|= [|an azn a3 
fs, d2 IJLO 0 u3 43; 432 433 


By equating the product of the rth row, L(r), of L and sth column, U(s), 
of U with the element a,, we can calculate successively the elements of L 
and U. If L(r)U(s) denotes the product of the rth row of L and the sth 
column of U, we have 


LOUC) = u; =a; 
L(1)U(2) = u2 = 442 
L(1)U(3) = u13 = 443 
L(2)U(1) = laity =.@24 


whence 
hy = Agy/ttys, 
if uj) = ay, #0. 
L(2)U(2) = lyyy2 + 22 = a22 
whence 
u2 = 422 — h; X yp 
which gives 133, and so on. 


The computations are performed to produce, in order, the first row of U, 
second row of L, second row of U, and so on. 
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Exercise 
Find the triangular decomposition of 
1-3 2 
A= 1 2 1 
-1 -3 -3 


and compare the results with those obtained for A by the Gauss elimination 
method in sub-section 8.2.1. 


Solution 
Let 
1 -3 2 1 0O Offuy, uiz ug 
1 2 T}=]h, 1 O||O uz uz 
-1 -3 -3 hi ha LILO O tz 
Then ordinary matrix multiplication gives us 1 = u, —3 = t2, 
2=; 
l = hit; whence l, =1 
2 = layty, + u22 whence 2. =5 
1 = dyytty3 + U23 whence 23 = —1 
=] = hitii whence /;, = —1 
6 
—3 = 15,2 + [32 u22 whence l = -3 
11 
—3 = 31u33 + l32U23 + U33 whence u33 = rise 


We have, therefore 


0 o7F1 -3 2 

1 O;;O 5 = 
“ -1 ae 1;//0 0 i 

F — 


All the figures can be found in our previous work (in sub-section 
8.2.1) on the matrix A, using Gauss elimination. 


We can now see quite easily when the method illustrated above will fail. 
It is clear that the product of successive leading submatrices * of L and U 
is equal to the successive leading submatrices of A, that is 


[Hiu]= [a] 
[ 1 | ie mal = ee | 
hi IJLO uz a23 22 


It follows that we cannot compute /,,; (and hence u32) unless u,, #0. If 
uy, #0, we can produce the second decomposition given above, but can 
go no further if u33 = 0. 


etc. 


Now wy, = 4, = det [a,,] 
a, a 
Uy, U2 = det | “1? 
a2 rz 


and we have therefore shown that the method will succeed only if successive 
leading submatrices of A are non-singular. We can complete the decom- 
position if the first »— 1 such matrices are non-singular, but if A is 
singular then the final element w,,, will be zero. 


* The meaning of the term feading submatrix should be clear from what follows. 


In fact the essential interchanges of sub-section 8.2.3 are designed so 
that some row-permuted form of A has non-singular leading sub- 
matrices, and it follows that if A itself is non-singular we can always form 
the triangular decomposition of this row-permuted A. If A is singular 
with rank  — 1 we can also perform the decomposition, though the last 
element of U will be zero. 


It is not quite obvious how this “interchanging” process can be incor- 
porated into the automatic computation of L and U, and we shall not go 
into details about this. The possibility, however, is revealed in the following 
exercise. 


Exercise 
OIII 
; : re 24 
Consider the matrix 4 = 2240 
1211 


In sub-section 8.2.3 we carried out on 4 an elimination process with inter- 
changes, first interchanging rows 1 and 2 and subsequently rows 3 and 4. 
Write down the matrix derived from A by making these interchanges in 
advance, compute the triangular decomposition of this matrix, and com- 
‘pare the results with those of the elimination method. 


Solution 


The row-permuted matrix and the triangular decomposition are 


1121 1000 ti 2 1 
O11 1)/_]/0100 0 1 1 1 
12211) /1 110 0 0-2 -1 
2240 2001 00 0-2 


and no i4, = 0. Observe that U is the same as the upper triangle 
of the elimination process with row interchanges. 


To complete the solution of linear equations, using the idea of triangular 
decomposition, we can write 


IA, b] = L[U, c), 


the vector c being obtained from b by the same process by which the 
columns of U are obtained from A. Alternatively, and equivalently, we 
can compute c, having found L, by forward substitution in the equations 
Le =b, written explicitly as 
c =b, 

liet c =b, 

l316; + l3262 + 63 = b3, etc. 
The vector ¢ is of course the right-hand vector which is produced from b 
in the standard Gauss elimination process. The final step in the solution 


obtains x from Ux = c by back-substitution, as in the Gauss elimination 
method, 


Exercise 
Solve the equations 
Xx, — 3x2 + 2x3 = —12 


x, + 2x2 + x; S 
—X, — 3x, -3x =- 4 


I 
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by computing L, U, and c from the equations [A, b] = L[U, c], and obtain 
x by back-substitution in Ux = c. Compare the results, at every stage in 
the computation, with those of the first example of sub-section 8.2.1. 
Solution 


We find 


[Z] [U, c] 
] 0 0 1 -3 2 -12 
_ l 1 0 0 5 -l 17 
> 6 11 22 
= E 1 s5 2 
5 9 9 5 5 


and the back-substitution follows as before. 


A noticeable feature of this compact elimination method is that we record 
only the elements of L, which are the negatives of the multipliers in the 
Gauss elimination process, the elements of U, which is the final upper 
triangular matrix U = A® of the Gauss elimination process, and the final 
right-hand vector c = b‘) of the Gauss elimination method. In other 
words, we apparently compute and certainly record only the pivotal rows 
of the Gauss elimination method. What, you might well ask, has happened 
to a non-pivotal computed row such as the third row of the matrix A 
and vector b??? An answer to this is indicated in the following exercise. 


Exercise 


In the previous exercise, the last element of the vector c, which is the 
element b$? of the Gauss elimination method, is obtained from the 
equation 


l31 Cy + l32 Cp + €3 = bz, 
that is 

C3 = b; — l3; cy — h32 C2. 
Verify that 

bz- hic = oY 


(of the Gauss elimination process). 


Solution 
bs = l3, & = —4— (-1)(- 12) = -16 = b®. 


In fact, all the unrecorded numbers of the Gauss elimination method are 
the partial sums of the formula for the final element in the relevant place 
in the array, The practical importance of this result will become apparent 
when we perform an error analysis of the Gauss elimination method. 
This we proceed to do, demonstrating in the process that the Gauss 
elimination method as we have described it, with interchanges only where 
essential, can, in practice, suffer from acute induced instability when we 
use inexact computer arithmetic. 
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8.2.6 Summary of Section 8.2 


In this section we defined the terms 


Gauss elimination method 

upper triangular matrix 
back-substitution 

Jordan method 

pivot 

essential row interchanges 

unit lower triangular matrix 
triangular decomposition of a matrix 
compact elimination method 


We introduced the notation 


L(r) 
U(s) 


Techniques 


(page C 12) 
(page C 12) 
(page C 12) 
(page C 17) 
(page C 18) 
(page C 20) 
(page C 21) 
(page C 21) 
(page C 25) 


(page C 12) 
(page C 12) 
(page C 12) 


, (page C 12) 


(page C 21) 
(page C 22) 
(page C 22) 


1. ı Solve Ax = b and invert A by the Gauss elimination method and by 


the Jordan method. 


RYN 
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Determine the triangular decomposition of A. 
Solve Ax = b by the compact elimination method. 
Evaluate det A by reducing A to triangular form. 


8.3 INDUCED INSTABILITY OF GAUSS 
ELIMINATION 


8.3.1 A Numerical Example 


Following our standard pattern, the next thing we should ask is whether 
or not the Gauss elimination method suffers from induced instability 
when, as is almost invariably the case, we have to use a computer ona 
practical problem. 


First we give an example to show that this is undoubtedly possible, and 
we use for this purpose a hypothetical four-decimal-digit floating-point 
computer. The equations are 
—1.414x, + X2 = 0.1000 
xı — 1.414x, + X3 = 0.1000 
Xz — 1.414x3 + X4 = 0.1000 
x3 — 1.414x, = 0.1000 
which may have come from a boundary-value problem involving a re- 


currence relation of second order, introduced in Unit 7, Recurrence 
. Relations. 


At each stage there is only one number to eliminate, and we can write 
down the upper triangular form in the process of computing it. The 
multipliers are written on the left, and we find 


0.1072 —1.414x, + x2 = 0.1000 
1.415 — 0.7068x, + x3 = 0.1707 
~ 1000 0.0010x;+ x4 = 0.3415 


=> 1001x, = —341.4 
Back-substitution gives 
X4 = 0.3411, x, = 0.4000, x, = 0.3244, x, = 0.1587. 
A glance at the original equations reveals a symmetry x; = X4, X2 = X3, 
and our answers are clearly far from the truth. The problem, incidentally, 
is not particularly ill-conditioned, and in fact the method has exhibited 


considerable induced instability: somewhat surprising, since this is really 
a very small problem! 


Let us do some non-essential interchanges, not theoretically necessary, and 
re-order the equations to read 


—1.414x, +x = 0.1000 
x — 1.414x3 + x4 = 0.1000 

x, — 1.414x, = 0.1000 

x, — 1.414x, +x% = 0.1000 
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It is perhaps now more convenient to record separately the successive 
“reduced” sets of equations. We have (with pivots emboldened) 


Multipliers Matrices Vectors 
— 1.414 1 0 0 0.1000 
0 1 —1.414 1 0.1000 
0 0 1 —1.414 0.1000 
0.7072 1 — 1.414 1 0 0.1000 
— 1.1414 1 0 0 0.1000 
0 1 —1.414 1 0.1000 
0 0 1 -1.414 0.1000 
0.7068 0 —0.7068 1 0 0.1707 
— 1.414 1 0 0 0.1000 
0 1 —1.414 1 0.1000 
0 0 1 —1.414 0.1000 
— 0.0006 0 0 0.0006 0.7068 0.2414 
— 1.414 1 0 0 0.1000 
0 1 —1.414 1 0.1000 
0 0 1 —1.414 0.1000 
0 0 0 0.7076 0.2413 


Back-substitution gives 
X4 = 0.3410, x, = 0.5822, x, = 0.5822, x, = 0.3410. 


Symmetry reveals that the answers are almost certainly correct to four 
decimal figures, and this can be checked by more accurate computation. 
We have eliminated the induced instability! 


8.3.2 Error Analysis of the Gauss Elimination Method 
(backward error analysis) 


In Unit 7, Recurrence Relations, we investigated the instability evidenced 
by computations with recurrence relations. For example, for the problem 
defined by 


Veri =a, Yp +b,  Yo= a, 


we observed that a local error is made in the floating-point arithmetic 
computation of any y,,, from a previously computed y,, and we analysed 
the effect of those errors to deduce the total error in the computed Day 
say. This total error was compared with the value of y, which would be 
produced by exact arithmetic. (We assumed, for this purpose, that all the 
data are known exactly and can be stored exactly.) 


This method of analysis, which gives an estimate of the error in the 
computed result, is called forward error analysis. It turns out that this is 
an extremely complicated operation when applied to the solution of linear 
equations. We prefer here the easier method of backward error analysis, 
whose aim is not to find the errors in the computed solutions but to 
discover what changes in the data would produce these solutions with 
exact arithmetic. We try to show that our method produces a solution 
vector X which, while not satisfying exactly the given equations Ax = b, 
will satisfy exactly the “ perturbed” equations (A + 5A)X = b + ôb. If the 
elements in the perturbations 6A* and ôb are small we have little induced 
instability, and any dissatisfaction with the solution can spring only from 
the possibility of inherent instability. By looking at the sizes of perturbations 
for different methods we evaluate our techniques, retaining those for which 


* Note that 64 is not “Stimes 4”. 
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6A and ôb have small elements relative to those of A and b respectively, 
and discarding those for which these quantities are, or can be, relatively 
large. 


The basic numerical operations in the Gauss elimination process are 

(Gi) the computation of a multiplier, such as m, = —42,/a,,, and 
(ii) the computation of a new element, such as a = a,, + 11240,2. 
In the computation (i) the machine makes a rounding error and stores 


Gay — 21411 


M2 mMm +e p +e 

1 21 T €21 _ 21 
ayy ay 

Hence the calculated multiplier would be the exact value, using exact 

arithmetic, if a, were replaced by az, — 221431- 


In the computation (ii) the machine tries to compute 


(2) = mn 
a32 = 422 + M21412; 


but only succeeds in computing and storing 


a$? = azz + M21032 + €22 = (a32 + €22) + 21442. 
It follows that the stored element would be the correct value, using exact 
arithmetic, if a23 were replaced by a2) + e32. 


We shall later want to find upper bounds for these “perturbations” e,,, 
but to see how it goes let us do an elimination using one-figure floating- 
point arithmetic, and compute the perturbed matrix which would give rise 
exactly to the computed and stored multipliers and upper triangular 
matrix. 


We take 
08 02 01 
A= 35 0.5 -02| 
07 03 09 
: 3 a .5 —0.7 
With a,,; = 0.8 as pivot, the multipliers are mz; = 08 and m3, = on": 
In our floating-point arithmetic, however, the machine stores mz; = 0.6 
and 3; = —0.9. These would be exact if we changed the a, term from 


—0.5 to —0.48, that is added 0.02 to it, and changed the a3, term to 0.72, 
which is again an addition of 0.02. 


Now consider the computation of the new a3). We have 
a = 0.5 + 0.6 x 0.2 = 0.62 = 0.6 + 0.02. 


We store 0.6, and this would be exact if we had changed the original az, 
from 0.5 to 0.48. 


The stored first step of the elimination giving A, and the changes 6,4 
in A which would produce everything exactly, are 


Ae 5,4 


08 02 O11 | 0.00 0.00 0.00 
0 06 —O1 | 0.02 —0.02 0.04 
0 O1 08 Í 0.02 —0.02 —0.01 


We can now ignore the first row and column, and repeat the procedure 
with the remaining rows and columns. 


Starting with 


0.6 —0.1 
0.1 0.8 


the stored multiplier is —0.2, which would be obtained exactly if the 
a¥) = 0.1 were changed to 0.12. 
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The new 


aÑ) = 0.8 — 0.2(—0.1) = 0.82 = 0.8 + 0.02 


and this would be produced exactly if we made a change of —0.02 in a}. 


Writing everything in full, we have the arrays: 


multipliers A 
0.8 0.2 01 
0.6 -05 05 —0.2 
—0.9 07 0.3 0.9 
A) 6,4 
0.8 0.2 Ol 0.00 0.00 0.00 
0 06 —O0.1 0.02 —0.02 0.04 
—0.2 0 0.1 0.8 0.02 —0.02 —0.01 
A® =U 6,4 
0.8 02 OI 0.00 0.00 0.00 
0 06 -O.1 0.00 0.00 0.00 
0 0o 08 0.00 0.02 —0.02 


The important fact is that these perturbations are additive; elimination 
with exact arithmetic would produce exactly all the multipliers and the 
final upper triangular form if we started with the matrix A + 5,4 + ô, A. 
Exercise 


The easiest way to check this last statement is to express it in terms of the 
triangular decomposition. Show that the stored L and U satisfy exactly 
the equation 


LU=A+5,A+6,4. 


Solution 
The matrix A + 5,A + 6,4 is 

08 02 O17 0.00 0.00 0.00 
—0.5 0.5 —0.2| + |0.02 -0.02 0.04 
07 03 09 0.02 —0.02 —0.01 

0.00 0.00 0.00 0.8 0.2 0.1 
+ [0.00 0.00 0.00}=|]-0.48 0.48 —0.16 
0.00 0.02 —0.02 0.72 0.30 0.87 


and it is easy to check that this is exactly the product LU of the 
stored triangular matrices 


1 0 0 08 02 Ol 
L= | —0.6 1 0|, U=]|0 0.6 -01| 


0 0 0.8 
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In general, if 4 is the largest possible absolute value of any element of the 
matrices ôA and ô, A and in the general case of the matrices 6,4, ô, A, 
«++, 6,-1A, then the pattern of perturbation in A, given by ðA +, A + 


++. +6,-14, is bounded by 


000 
I 1 
nl l 1 s|4y 
1 E 
0 0 
0 0 
0 0 
tajo o 
0 0 


the final element being y(n — 1). 


000 

000 

Lra 

Ie ght A 

0 

0 

0 

1 +e =n 
1 


- = — O 


NNO 


wnNnro 
WNr o 


The important point, of course, is the size of n, and this we evaluate with 
our rules for floating-point arithmetic. 


In the computation of the multiplier, M2, = —4>,/a,,, for example, we 


have 


= a 
Mz, = fl (a2, /a,5) => +e), 
u 


where e <2™‘ in a binary machine of word-length r binary digits. The 
perturbation in az, is then at most 2~'|a,,| in absolute value. 


Where the formation of a new element is concerned, for example 


(2) pa 
a) = azz + Mi2a2, 


2) 


a? = fl (a22 + M21032) 


= {a32 + M21412(1 + e,)} (1 + e2), 


the e, coming from a floating-point multiplication and the e, from a 
floating-point addition, and |e,|, |e.| <27'. 


The perturbation a3) — (a23 + i7i2,4;2) in a33 is then 


ay 


a2) 
a — 
i2 The, 


— 21 Ay2e, 


=(2 Ti 
= Ge, — M10126), 


where we neglect terms like (e3)? which are at most 2~?". The perturbation 


is then at most 


27a? + [m214321) 


in absolute value. 


These results give upper bounds for the elements of 6,4. For ô, A we 
have similar results, except that now the a,, in these expressions is re- 


placed by &{??, and so on. 


Our y, the largest of these elements therefore depends on (i) the sizes 
of the multipliers »2,,, and (ii) the sizes of the elements 
bations relative to the elements of the original A, which are obviously the 
important quantities, depend on the multipliers and the possible relative 
growths in the elements of successive “reduced” matrices A, 


=(k) 
Ars 


. The pertur- 


The failure of the first method for our example of sub-section 8.3.1 is 
now quite apparent. There was one very large multiplier of — 1000, and 
this also caused (as it always will) a large growth in one @,,, the last 
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element of the U matrix being — 1001. In the second method the multi- 
pliers never exceeded unity, the Z% did not grow in size as k increased, 
and the solution was very good. 


It is clear, incidentally, that the corresponding reduction of the right-hand 
vector b to its final form ¢ induces errors of exactly the same kind as those 
relevant to the last column of 4. We have therefore shown that the elimina- 
tion process for Ax = b reduces the problem exactly to the solution of 
UX = c, and that X is the exact solution of the perturbed problem 


(A + ôA) = b + ôb. 


The perturbations 5A and ôb can be arbitrarily large if any multipliers or 
any of the successive elements in A“ or b® are relatively large compared 
with those of A and b. 


Exercise 


In the first method of sub-section 8.3.1, two stages of the elimination 
produced the array 


A@) bo 

— 1.414 1 0 0 0.1000 

0 — 0.7068 1 0 0.1707 

0 0 0.0010 1 0.3415 

0 0 1 —1.414 0.1000 

1 

Show that the last stage of the elimination, which produces A“, alone 
induces in the coefficient a,, = —1.414, a perturbation of amount equal 


to nearly one-third of the value of a44, representing a large degree of 
induced instability. 


Are there any other contributions, in this example, to the perturbation 
of a44? 


Solution 


The multiplier at the next stage of the elimination is exactly — 1000, 
and the new value computed in the (4, 4) position is 


fi (—1000(1) — 1.414) = — 1001. 


The true value of this quantity is — 1001.414, and the error, the 
current contribution to the perturbation induced in a44, is —0.414, 
nearly one-third of a4, = — 1.414. 


There are no other contributions to the perturbation in a44, 
since a} is the same as a44 for k = 2 and 3. 


8.3.3 A Stable Form of Gauss Elimination 


In the elimination process using exact arithmetic, or in the corresponding 
calculation of the Hermite normal form, we had occasionally to inter- 
change rows to produce a non-zero pivot (essential interchanges). This 
device is used deliberately in Gauss elimination to ensure stability. 


Clearly we can make all the multipliers m,,<1 in absolute value by 
choosing as pivot the largest in absolute value of the possible candidates 
in the relevant column, and interchanging the rows to bring this element 
into the pivotal position. This also has a beneficial effect on the possible 
growth of the a. We cannot guarantee that they will not grow, but the 
growth is limited and in practice tends to be tolerable. 


With this method (Gauss elimination with interchanges, sometimes called 
pivoting), we can now give an absolute error bound for the induced 
perturbation. 
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If we look at the induced perturbations, take |m,s| <1, and let k be the 
largest in absolute value of all the elements in all the reduced matrices 
(including the original A), then, using the results established in sub-section 
8.3.2, we can now see that the pattern of perturbation is certainly bounded 
by 


0 
a 1 
2 t 
x2 'xk 2 


-= EO 
Wn oO 


0 
1 
2 
2 3 
That is, we have produced a U which would be obtained by exact arith- 


metic from the perturbed matrix A + ôA, where the elements of 5A are 
bounded as shown, and for which the largest coefficient is no greater than 


(5A) ny < 2(n ~ 1)27'k. 


Completing the story, by considering the effect on the right-hand vector b, 
the elimination has produced the upper triangular equations UX = c, and 
this would have been produced exactly from (4 + 6A)X = b + ôb, where 
the elements of maximum size in the perturbations are 


(5A) an < 2(n ~ 1)27'k, (ôb), < 2(n — 12741, 


where /is the largest in absolute value of all the elements on the right-hand 
side of the original or any “ reduced ” equations. 


Since the elimination operation involves roughly 4»? additions and multi- 
plications, this is an extremely satisfactory result. If k is not significantly 
larger than the original elements of A, then even for n = 10° the pertur- 
bation affects at most the last four decimal digits of our stored numbers. 
Quite commonly our binary machine stores the equivalent of ten or eleven 
decimal digits, so that these induced perturbations are likely to be far 
smaller, even for this large problem, than any inherent uncertainties in 
the data. In fact they will generally be very much smaller, since our bounds 
considerably over-estimate the precise perturbations. 


Exercises 


1. Consider the solution of linear equations in which the matrix is 


-/2 1 0 0 
Js 1 -/2 1 0 
= ‘ I =f if 

0 0 1 -./2 


By considering the values of the determinants of the first three principal 
submatrices of A, using exact arithmetic, show that when computer 
arithmetic is used some pivoting would be essential to avoid induced 
instability. 7 

(Hint Relate the determinants of the triangular decomposition of A, as 
discussed in sub-section 8.2.5.) 

2. By a similar process show that if the rows of A are permuted to the 
order 1, 3, 4, 2, then pivoting is likely to be unnecessary. 
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Solutions 


1. By relating the elimination to the triangular decomposition 
of A, we know that the diagonal elements of the U matrix 
satisfy the equations 


uy, = det [a,,] = =./2 
Uy, u22 = det i z = det ee 


1 
42, 422 A 


=2-1=1 


a Ayr 43 
Uii U22 U33 = det | az, a22 23 


a31 a32 33 


-/2 1 0 
= det 1 =/2 1 
0 1 -/2 


= —,/2(2 — 1) + 14/2) = 0. 


With computer arithmetic this computed value is unlikely to 
be zero but would be very small. The computed diagonal 
element u3, is therefore very small, producing a large next 
multiplier and hence induced instability. 

2. In the new ordering we have 


1 

1 

0 
1 -/2 1 0 
The determinants of the first three leading submatrices are 


-J2, -/2, -/2 
(the product of the diagonal terms, since all these matrices 


are upper triangular). None of these is small, so that there is 
no large multiplier and induced instability is unlikely. 


These results explain the phenomena of the example of 
sub-section 8.3.1. 


We should, of course, finally investigate whether there is any further 
instability in the solution by back-substitution of the equations UX =e. 
We merely state that we can use a similar process of backward error 
analysis to show that the finally computed solution X is the exact solution 
of another perturbed form 


(A + 5A)X =b + ôb, 


where, after pivoting in the elimination, the bounds on 6A and ôb are 
not more than twice those for the perturbations induced solely by the 
elimination. The stability of the complete Gauss elimination method is 
then assured. 


It is worth observing that if we used the 4 = LU decomposition, where A 
is the row permutation of 4 which guarantees that all the multipliers and 
therefore all the elements of L are numerically less than 1, then with just 
a little more accurate arithmetic we can reduce the perturbations even 
further, to spectacularly small values! 
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We observed at the end of sub-section 8.2.5 that the A = LU method 
produces u,,, for example, from the formula 


m = were 
A an = Unn = Gnn — lpi Uyn — Ino Un slant Unt 


and that the partial sums 
Ann i Li Hin 
(ann — lni tn) — laz Urn 
{lann = lai Uin) — n2 Man} — In3 Usp 


ete. 


are the elements a), a"), etc. in the successive “ reduced” matrices. Now, 
we have seen on page 33 that the perturbation induced in a,, by the 
elimination is something like 2 x 27! x k x (n — 1), and the factor n — 1 
comes from the fact that we actually evaluate Unn by forming each partial 
sum and rounding it before adding the next term. There are n — 1 such 
rounding errors. 


If it were possible, as it is in some machines, to evaluate u,, by doing the 
multiplications and additions accurately (in “double-length” arithmetic) 
and doing just one final rounding to single length (so that we are doing a 
sort of partial double-length arithmetic), then we remove the factor n — 1 

. and all similar factors in the perturbation array, and reduce the maximum 
perturbation to 2 x 2~' x k, which is very small and independent of n. 
This method, in fact, is used in the best computer programs when high 
accuracy is needed in the solution. 


But this raises another question. We may have a stable method, but can 
we say how good our computed solution is, and can we from our technique 
get any idea about the degree of ill-conditioning of the problem? This we 
discuss in the next section. 


8.3.4 Summary of Section 8.3 


In this section we defined the terms 


forward error analysis (page C 28) 
backward error analysis (page C 28) 
pivoting (page C 32) 
We introduced the notation 
6A (page C 28) 
ôb (page C 28) 
Mij (page C 29) 
ap (page C 31) 
Techniques 


1. Show that the computed solution of Ax = b is the exact solution X of 
the perturbed equations (A + 5A)X = b + ôb in which 5A and ôb can 
be large. 

2. Show that Gauss elimination with interchanges (pivoting) gives small 
perturbations 6A and ôb and therefore exhibits very little induced 
instability. 
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8.4 ACCURACY AND ILL-CONDIJTIONING 
8.4.1 Accuracy of Approximate Solution 


The backward error analysis essentially gives an evaluation of our fech- 
nique, but it does not give us an expression for the error in the computed 
solution of the linear equations. Any type of forward error analysis is 
virtually impossible, and in fact it is really rather difficult to get an accurate 
error bound for the solution. 


If we substitute the alleged solution X into the equations, and compute the 
residual vector r = b — AX, then the error in the solution is exactly Ar, 
that is x = X + A7'r. But the computation of A” ' is at least twice as labor- 
ious as the computation of X, and we would prefer to get some idea of 
the error without finding even an approximation to A~ +. 


If the problem is at all ill-conditioned, then even small elements in the 
residual vector r do not guarantee that X is a good approximation, since, as 
we have seen, A~'r could have larger components than X even if those of r 
are quite small. It can be proved, moreover, that if X is small then r is 
certainly small if we use our stable version of Gauss elimination. 


The only immediate information that the elements of the residual vector 
can provide is that we have not committed any serious blunders. On the 
other hand, in the process of computing X we have effectively performed 
the approximate triangular decomposition of A, and we can, therefore, 
without much extra labour, solve the equations Adx =r approximately 
to obtain a correction 6x = A~'r to X. 


Consider, for example, the equations 


0.5000x, + 0.3333x, + 0.2500x5 = 0.9500 
0.2500x, + 0.2000x, + 0.1667x3 = 0.5200 
0.3333x, + 0.2500x, + 0.2000x5 = 0.6700 


ordered so that no interchanges are needed in the elimination or triangular 
decomposition. The solution, correctly rounded to four decimals, is 


xy = 1.1887, x, = 0.6440, x, = 0.5641 


Now suppose we work with four-figure floating-point arithmetic. First we 
find the triangular decomposition, which turns out to be 


L U 
1 0 0 | 0.5000 0.3333 0.2500 
0.5000 1 0 0 0.03340 0.04170 
0.6666 0.8323 1 | 0 0 — 0.001310 


Next we find ¥ from the equations Le = b, UX = c, which give 


c =(0.9500, 0.045 00, —0.000 800 0) 
X = (1.205, 0.5847, 0.6107) 


and the approximate solution differs from the truth by a relative maximum 
of about 10%. 


Let us try to improve this approximation. We first compute exactly the 
residual vector 


r = b — Ax = (—0.000 055 51, 0.000 006 31, 0.000 058 50). 


We then round the elements of the residual vector to single-length floating- 
point form: 


F=(107* x —0.5551, 1075 x 0.6310, 1074 x 0.5850). 
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The rounding error is usually small, and here in fact zero, because the 


residual elements are themselves small, with several zeros after the decimal 
point. 


We now correct the solution, solving in obvious notation and with single- 
length arithmetic the equations Léc =F, Udx = dc, which give 


de = 10° *(—0.5551, 0.3407, 0.6714) 
ôx =(—107! x 0.1761, 107! x 0.6500, —10-! x 0.5125) 


The rounded better solution X + ôx is 
xı = 1.187, x, = 0.6497, x; = 0.5595, 


which now differs from the truth by a relative maximum of about 1%, 
which is a full decimal figure better than the first approximation. 


We can continue this process, first finding the exact new residual vector 
r = (0.000 079 99, 0.000 041 35, 0.000 047 90) 


and a new correction 5x. This we do not pursue. We note, however, the 
interesting fact that the new residual elements, for this much better 
approximation, are no smaller than those of the poor first approximation! 
This confirms the inadequacy of the residual vector as an indication of 
the accuracy of the approximate solution. 


It is important to realize that the process succeeds only if the residual 
elements are computed very accurately and if the resulting storage error 
is small. For otherwise the latter error, premultiplied by 4~!, could well 
be as large as our correction! 


It may be argued that this accurate computation of residual elements 
involves the use of “double-length” computer arithmetic, suggesting that 
the complete solution could be obtained quite accurately by using double- 
length arithmetic right from the start. The point is that double-length 
arithmetic is expensive in storage and in time. The computation of the 
residual vector involves only nê numerical operations, considerably fewer 
than the approximately 42° operations involved in elimination and back- 
substitution for the large values of which we meet in practice. We shall 
also see in sub-section 8.4.2 that this device has a further important benefit. 


We can go even further with this idea. If the problem is mathematical we 
have to use rounded data in the solution of the linear equations, but we 
might be able to compute the residual vector using a more accurate or 
even exact version of A. The correction could then be obtained as before. 


For instance, our stored matrix in the last example may be the rounded 
version of the true matrix 


k: il 
A=] 3]. 
3 4 3! 


We could obtain the first approximate solution as before, but the correctly- 
computed and rounded residual vector of our approximate solution is now 


r=1074(—0.7500, 0.2667, 0.1833). 

The correction vector obtained from this residual vector is 
ôx = (—0.005 212, 0.01613, —0.011 38) 

and the rounded corrected solution is therefore 


x =(1.200, 0.6008, 0.5993). 
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This has a maximum error of 0.0008 compared with the exact solution 
x = (1.200, 0.6000, 0.6000), 
whereas the first approximation has a maximum error of 0.0162. 


In all cases we can repeat the whole operation and expect to get even 
better answers. 


8.4.2 Practical Determination of Il-conditioning 


The method of correcting a first approximation which we have just de- 
scribed is also usually the most powerful method available for detecting 
the possibility of ill-conditioning. For, as we have seen in the error analysis 
of the Gauss elimination method, our first approximation X is the exact 
solution for a perturbed problem (A + 6A)x = b +ôb, where the pertur- 
bations 6A and ôb have known upper bounds. The corrected approxima- 
tion is effectively the exact solution of the original problem with smaller 
perturbations, and the number of figures to which our two approximations 
agree therefore gives useful information about the degree of ill-condi- 
tioning. 


In an ill-conditioned mathematical problem, in which the (storage) un- 
certainty in the original data is at least no greater than the perturbation 
induced by the method, examination of successive approximate solutions 
gives not only an idea about the ill-conditioning but also a reasonable 
guarantee of the number of figures we can quote as being correct in the 
solution. There may, of course, be some limit to this, because if we have 
to round the elements of the residual vector, there is an unavoidable error 
A~‘6ér. Even though ôr is of order 27% the elements of A~‘5r might, in 
extreme cases, affect some of even the first ¢ digits of the solution. 


For a physical problem, the real uncertainties in the data are, as we have 
seen, likely to be very much larger than the induced perturbations in the 
method. In that case we have a virtual guarantee that the number of 
meaningful figures in the result is not greater than the number of figures 
to which our first and second approximations agree. It may be very much 
smaller, but we have no easy way of deciding this. 


What we would really like to do is to find accurate intervals in which the 
x, certainly lie, corresponding to the different data intervals in which the 
a, and b, lie. This is an extremely difficult problem, and the subject of 
much current research. Nevertheless the partial solution of the last para- 
graph is already an important and useful result. 


8.4.3 Summary of Section 8.4 


In this section we defined the term 


residual vector - (page C 36) 
and introduced the notation for it 

r (page C 36) 
Techniques 


1. Compute the residual vector for an alleged solution X of Ax = b and 
hence determine 6x, a first correction to X. 


2. Use the size of 6x to determine the degree of ill-conditioning of the 
problem. 
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8.5 SUMMARY OF THE UNIT 


In this unit we have discussed and illustrated the following items. 


1. The fact that the solution of sets of linear algebraic equations can be 
very sensitive to small changes in the data, the coefficients of the matrix or 
the constant terms in the equations. In other words, that this problem can 
exhibit ill-conditioning. 

2. The uncertainties in the answers may be correlated, so that, for 
example, an allied problem such as the evaluation of some linear com- 
bination of the elements of the solution vector, may be quite well-con- 
ditioned. 

3. The elimination methods of Gauss and Jordan, and their relation with 
the production of the Hermite normal form, for solving systems of equa- 
tions or inverting matrices. 

4. The connection between the reduction of a matrix to upper triangular 
form, by elimination, and the expression of a matrix as the product of 
lower and upper triangular matrices. 

5. The effect of zero diagonal elements in the elimination method, the 
corresponding need for row interchanges, and the connection between this 
phenomenon and the singularity of leading principal sub-matrices. 

6. The fact that the elimination method can produce very poor results 
because the method can exhibit induced instability. 

7. An error analysis, relating the induced instability to the fact that the 
computed solution is the exact solution of a perturbed problem, in which 
the perturbations in the matrix and right-hand-vector can be very large. 
8. A method of avoiding the induced instability, giving rise to guaranteed 
stability, which merely involves selected row interchanges so that the 
largest element in the relevant column is on the diagonal. This is “ Gauss 
elimination with interchanges, or with pivoting”, and the error analysis 
gives a guaranteed small bound for the induced perturbations. 

9. That the error analysis involved, so-called “ backward” error analysis, 
gives an evaluation of the method but not a bound for the accuracy of 
the computed solution. 

10. A method for correcting the approximate solution, which involves 
little extra work and which also exhibits the degree of ill-conditioning of 
the problem. 


Definitions 


The terms defined in this unit and page references to their definitions are 
given below 


direct method (page C 6) 
iterative method ; (page C 6) 
physical problem . (page C 7) 
mathematical problem (page C 9) 
Gauss elimination method (page C 12) 
upper triangular matrix (page C 12) 
back-substitution (page C 12) 
Jordan method (page C 17) 
pivot (page C 18) 
essential row interchanges (page C 20) 
unit lower triangular matrix (page C 21) 
triangular decomposition of a matrix (page C 21) 
compact elimination method (page C 25) 
forward error analysis (page C 28) 
backward error analysis (page C 28) 
pivoting (page C 32) 
residual vector (page C 36) 
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Techniques 


k 


Solve Ax = b, given specified errors in 4 or b and hence say whether 
or not the problem is ill-conditioned. 


2. If such a problem is ill-conditioned determine whether the problem 
of finding f=; Cs Xs, given cj, €z,- -, Cp, is well-conditioned. 

3. Solve Ax = b and invert A by the Gauss elimination method and by 
the Jordan method. 

4. Determine the triangular decomposition of A. 

5. Solve Ax = b by the compact elimination method. 

6. Evaluate det A by reducing A to triangular form. 

7. Show that the computed solution of Ax = b is the exact solution X 
of the perturbed equations (4 + 5A)X =b + ôb in which 5A and 
ôb can be large. 

8. Show that Gauss elimination with interchanges (pivoting) gives small 
perturbations 6A and db and therefore exhibits very little induced 
instability. 

9. Compute the residual vector for an alleged solution X of Ax = b and 
hence determine ôx, a first correction to X. 

10. Use the size of ôx to determine the degree of ill-conditioning of the 
problem. 

Notation 
Mij (page C 12) 
p® (page C 12) 
Aw (page C 12) 
U (page C 12) 
L (page C 21) 
Lr) (page C 22) 
U(s) (page C 22) 
ôA (page C 28) 
ob (page C 28) 
Tiy (page C 29) 
at? (page C 31) 
r (page C 36) 
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8.6 SELF-ASSESSMENT 
Self-assessment Test 


This Self-assessment Test is designed to help you test quickly your under- 
standing of the unit. It can also be used, together with the summary of 
the unit for revision. The answers to these questions will be found on the 
next non-facing page. We suggest you complete the whole test before 
looking at the answers. 


1. Find, by the method of Jordan elimination, the inverse of each of 
the matrices 


1 09 p å 
aali ae asia 


2. We wish to solve the equations Ax = b + e, where e is a vector of 
uncertainties in the measured vector b, and each element of e is at 
most € in absolute value. 


For which of the matrices of Question 1 is the problem absolutely ill- 
conditioned, in the sense that the uncertainties in the components 
of the solution vector x are greater than those of the uncertainty 
vector e? 
1 

3. Which of the following problems are (a) absolutely ill-conditioned, 
and (b) relatively ill-conditioned, in the sense that the relative un- 
certainties in the components of the solution are greater than the 
relative uncertainties in the vector b? 


(i) x, +09x,=1+e 
X+ x, =-lt+e, 
Gi) x, +0.9x,=1+e, 
xt xX, =l1+e, 


where |e,|, |e2.| < 107%. 


4, In the problem (ii) of Question 3, show that all we can say about x; 
and x, is 


x, =1419 x 107% xz =0 F 20 x 107° 


but that, for the computation of y = x, + x2, we can assert with 
confidence that 


y=1+4107?. 


5. We wish to solve linear equations Ax = b, with 


1 09 1 
A= 1 1 2 
-1 01 3 


Is pivoting necessary in Gauss elimination to avoid induced in- 
stability? If it is, write down a matrix A, a row-permutation of A, 
for which pivoting is not necessary. 


6. From your work in Question 5, write down the determinant of the 
matrix A of that question. 


7. Carry out triangular decompositions of (i) the matrix A, in Question 1, 
and (ii) the row-permuted A of Question 5. 
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8. Working with one-digit floating-point arithmetic, the Gauss elimina- 
tion process for solving the equations 


0.8x, + 0.6x, = 0.2 
—0.5x, + 0.4x, = —0.8 
produces, in the process of elimination, the arrays 
multipliers A b 


0.8 0.6 0.2 
0.6 —0.5 06 0.2 


AD b2) 
0.8 0.6 0.2 
0 0.8 —0.7 


Write down a matrix A + 5A and a vector b + ob which, with exact 
arithmetic, would produce exactly the recorded multipliers, the 
recorded upper triangle A, and the recorded vector b® in this 
elimination. 


9. With the same floating-point arithmetic, the back substitution 
applied in Question 8 leads to the approximate solution 
x, = 0.9, x, = —0.9. 


Show that the true solution of the equation is x, + 6x, X2 + 6X2, 


where 
0.8 0.6] [5x,] _ [0.02 
—0.5 0.4} |x| ~ [0.01 
10. Compute the correction, ôx of Question 9, working with one-digit 
floating-point arithmetic, and verify that the corrected value is a 


better approximation to the true solution (which is x, = 0.903, 
x2 = —0.871, correct to three decimal places). 
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Solution to Self-assessment Test 


A A, 

1 09 1 0 1 1 0 

-1 ıl l o 1 -1 1 -3 0 1l 
-9 1 09 1 0 4 1 1 1 0 
0 0l -1 1l 0-4 -1 1 

1 0 10 —9 1 0 a4 

0 0.1 -1 1l 0 -4 -1 I 

I Ay? I Ax‘ 

1 0 10 -9 1 0 4 

o 1 --10 10 0 1 4—4} 


The uncertainty in the solution is ôx = A~‘e. For A,, we have (using 
the inverse in Question 1) 

dx, =10e, — 9e, 

ôx, = — 10e, + 10e,. 
If |e,|, [e2| <e, then the maxima of |6x,|, |6x2| are 19e and 20e. 


These are much greater than |e,| and |e,|, and the problem is 
absolutely ill-conditioned. 


For Az, we have 
ôx, = łe, + den 
ôx = te, — fer. 


The maxima of |6x,|, |6x2| are e and 4e, and the problem is abso- 
lutely well-conditioned. 
The matrix is A, of Question 1. Using its inverse, we find, for (i), 


l-l- al- tello Tell] 


The maximum values of the uncertainties are 19 x 107° and 
20 x 107? in absolute value. But these are only 1 in 10° of the values 
of x, and x, so that the problem, though absolutely ill-conditioned, 
is relatively well-conditioned. 


For (ii), we find 


11 = [1] uncertainti 
z| 710 rtainties. 


The maximum uncertainties are the same as before, but they are 
large relative to the value of x,, and the problem is ill-conditioned, 
both relatively and absolutely. 

The maximum uncertainties in part (ii) of Question 3 occur when 
e, = £1073, e, = £1079 and all we can then say is 


x,=1+419x 1073 
x, =0 F 20 x 1073, 


The value of 


x,+x,=[1 1] [k] 


= 1 + 10e, — 9e, 

=E 1 k = 10e, + i 

= 1 + (10e, — 9e,) + (~10e, + 10e,) 
=] +e, 
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and since |e,| < 107%, we can say that ¥=1+ 1073, a reasonably 
well-conditioned situation. 


The elimination goes as follows, with Pivots emboldened. 


multipliers A 
1 0.9 1 (No interchange needed here.) 
-1 1 1 2 
1 =] 01 3 
AD 


1 0.9 1 (There is a larger element in 
0 0.1 1 column 2 below the diagonal, 
0 1.0 4 interchange needed.) 


Interchange rows 2 and 3. 


multipliers A?) 
109 1 
0 10 4 
—0.1 0 01 1 
1 09 1 
0 1.0 4 (Final upper triangular form.) 
0 0 0.6 


Pivoting is therefore not necessary if we interchange rows 2 and 3 
before starting the elimination, that is, starting with 


The determinant of A in Question 5 is the product of the diagonal 
terms of the final upper triangular form, that is 


det A = 0.6. 
A is obtained from A with one row interchange, so 

det A = —0.6. 

L U 
ie 1 0.9) _ [1 Oļfun u2 
ttt 1] Uhr IJLO uz 

Then, using a,,= product of rth row of L with sth column of u, 
we find 

(1, 1) l =i; 

(1,2) 0.9 = u2 

(2, 1) 1 =l; uy, giving h, = l 

(2, 2) 1 = hy u2 + u22, Biving t22 = | — 0.9 = 0.1. 


We can use the same method for the matrix A of Question 5. But it 
is easier to use the relation between triangular decomposition and 
elimination (which we have already performed). This is 


1 0 0J [u 12 uiz 
A=|-my, 1 0 0 U22 u23 
=m; —mM32 1 0 0 33 
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Since the rows 2 and 3 haye been interchanged in the elimination 
involving A, however, we must interchange the multipliers — 72, and 
— m3, SO that they refer to their “ proper” rows. This gives 


L U 


1 0 O7f! 09 1 
A=|-1 1 0jj0 10 4], 
1 0.1 1)/[0 0 06 


which can be verified by direct matrix multiplication. 
The easiest way to find the perturbed matrix A + ôA is to find the 
matrix for which the stored L (obtained from the stored multipliers) 
and the stored U (the final stored upper triangle) is the exact 
triangular decomposition. That is, 
1 0]f0.8 0.6 0.8 0.6 

AFIA=LUS E | [ 0 oF = Es A j 

Similarly, 


1 of 02 02 
y (2) — = 
eee bas | Eg g fa : 


The elimination applied to 


0.8x, + 0.6x2 
—0.48x, + 0.44x, 


can then be performed exactly with one-figure floating-point decimal 
arithmetic. 


0.2 
—0.82 


ll 


Ifx= [=| , the computed first approximation, we find the residual 
vector 

r=b-— Ax 
The correct solution satisfies Ax = b, so that the correction ôx = x — X 
satisfies exactly the equation 

r = Ax — AX = A(x — È) = AOx. 
We find 


rn] _[ 02] [ 08 0.6 0.9] _ [0.02 

r —0.8 —0.5 0.4||—0.9]| [0.01]? 
and the equations given in the question are Adx = r. 
For the solution of the equations for the correction ôx, which we 
can only get approximately with the use of one-digit arithmetic, we 


use the same elimination method, noting that the “elimination in 
the matrix” has already been performed. We have 


(i) elimination Á r 
0.8 0.6 0,02 
0.6 —0.5 04 0.01 
0.8 0.6 0.02 
0 08 0.02 


(ii) back-substitution 
ôx, = fi (0.02/0.8) = 0.02, in our arithmetic. 


__ „ (0.02 — 0.6(0.02) 
ox =A ( 0.8 


The hoped-for better approximation is x, = 0.91, x. = —0.88. At 
least the maximum error has been reduced from 0.029 to 0.09. (But 
the roundings here are quite marginal. We could equally accurately 
take ôx, = 0.02/0.8 = 0.03 in our arithmetic; and then ôx, = 0. The 
corrected solution is then x, = 0.90, x, = 0.87, with errors of only 
0.003 and 0.001 respectively.) 


) = 0.01. 
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